Document Style Recognition Using Shallow Statistical Analysis

Author

  • Pavel Braslavski
Abstract

Documents differ not only in topic but also in style. Style is a very broad and ambiguous term used in the arts, fashion, literary criticism, and linguistics. In the case of text documents, we can accept an intuitive understanding that style relates mainly to the form (how), whereas topic relates to the content (what) of a document. Although some topics strictly determine the style that can be used, most topics can be expressed in various styles. Thus, style can be considered orthogonal to topic in a certain sense, and it can therefore be assumed to be a useful parameter in many text processing and information retrieval tasks.

Similar resources

A Shallow Description Framework for Musical Style Recognition

In the field of computer music, pattern recognition algorithms are very relevant for music information retrieval (MIR). One challenging task within this area is the automatic recognition of musical style, which has a number of applications, such as indexing and selecting musical databases. In this paper, the classification of monophonic melodies of two different musical styles (jazz and classical) r...

Problems and Approaches for Oriental Document Analysis

Machine understanding of hand-filled documents in China, Japan and Korea requires not only general solutions of document analysis but also the ability to handle peculiarities of the Oriental languages. As expected, handwritten Chinese character recognition is the major task for it. In addition, Japanese Kana, Korean Hangul, the Roman alphabet, and numerals are targets of recognition. The main dif...

Style Recognition through Statistical Event Models

The automatic classification of music fragments into styles is one challenging problem within the music information retrieval (MIR) domain and also for the understanding of music style perception. This has a number of applications, including the indexation and exploration of music databases. Some technologies employed in text classification can be applied to this problem. The key point here is ...

Automatic transformation of lecture transcription into document style using statistical framework

This paper addresses automatic transformation from spoken-style texts to written-style texts. Exact transcriptions and speech recognition results of live lectures include many spoken language expressions, and thus are not suitable for documents and need to be edited. In this paper, we present a method of applying the statistical approach used in machine translation to this post-processing t...

Named Entity Recognition for Web Content Filtering

Effective Web content filtering is a necessity in educational and workplace environments, but current approaches are far from perfect. We discuss a model for text-based intelligent Web content filtering, in which shallow linguistic analysis plays a key role. In order to demonstrate how this model can be realized, we have developed a lexical Named Entity Recognition system, and used it to improv...

Publication date: 2004